Contextual Bandits with Cross-learning
In the classical contextual bandits problem, in each round $t$, a learner
observes some context $c$, chooses some action $a$ to perform, and receives
some reward $r_{a,t}(c)$. We consider the variant of this problem where in
addition to receiving the reward $r_{a,t}(c)$, the learner also learns the
values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that
would have been achieved by performing that action under different contexts.
This variant arises in several strategic settings, such as learning how to bid
in non-truthful repeated auctions (in this setting the context is the decision
maker's private valuation for each auction). We call this problem the
contextual bandits problem with cross-learning. The best algorithms for the
classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret
against all stationary policies, where $C$ is the number of contexts, $K$ the
number of actions, and $T$ the number of rounds. We demonstrate algorithms for
the contextual bandits problem with cross-learning that remove the dependence
on $C$ and achieve regret $O(\sqrt{KT})$ (when contexts are stochastic with
known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic
with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are
adversarial but rewards are stochastic).
Comment: 48 pages, 5 figures
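The cross-learning feedback described in this abstract can be illustrated with the bidding example it mentions. The following is a minimal sketch, assuming a first-price auction in which the bidder's reward is valuation minus bid when the bid wins and zero otherwise. The function names, the discrete valuation grid, and the running-sum bookkeeping are hypothetical illustrations, not the algorithms from the paper.

```python
def cross_learned_rewards(bid, won, valuations):
    """Reward the same bid would have earned under every candidate valuation.

    Assumes a first-price auction: a winning bid pays itself, a losing bid
    earns nothing. The outcome `won` does not depend on the valuation, so one
    observation reveals the reward for all contexts at once.
    """
    return [(v - bid) if won else 0.0 for v in valuations]


def update_estimates(sums, counts, action_index, rewards):
    """Update the running statistics of one action for every context."""
    for context_index, reward in enumerate(rewards):
        sums[context_index][action_index] += reward
        counts[context_index][action_index] += 1


# Hypothetical grid: three valuations (contexts) and two bids (actions).
valuations = [0.25, 0.5, 1.0]
bids = [0.25, 0.5]
sums = [[0.0, 0.0] for _ in valuations]
counts = [[0, 0] for _ in valuations]

# The learner bids 0.5 (action index 1) and wins the auction.
rewards = cross_learned_rewards(bids[1], True, valuations)
update_estimates(sums, counts, 1, rewards)
```

A single round updates the estimate of the chosen bid under all three valuations, which is why the number of contexts need not appear in the amount of exploration required.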
Competition and Yield Optimization in Ad Exchanges
Ad Exchanges are emerging Internet markets where advertisers may purchase display ad placements, in real-time and based on specific viewer information, directly from publishers via a simple auction mechanism. The presence of such channels presents a host of new strategic and tactical questions for publishers. How should the supply of impressions be divided between bilateral contracts and exchanges? How should auctions be designed to maximize profits? What is the role of user information and to what extent should it be disclosed? In this thesis, we develop a novel framework to address some of these questions. We first study how publishers should allocate their inventory in the presence of these new markets when traditional reservation-based ad contracts are available. We then study the competitive landscape that arises in Ad Exchanges and the implications for publishers' decisions. Traditionally, an advertiser would buy display ad placements by negotiating deals directly with a publisher, and signing an agreement, called a guaranteed contract. These deals usually take the form of a specific number of ad impressions reserved over a particular time horizon. In light of the growing market of Ad Exchanges, publishers face new challenges in choosing between the allocation of contract-based reservation ads and spot market ads. In this setting, the publisher should take into account the tradeoff between short-term revenue from an Ad Exchange and the long-term impact of assigning high quality impressions to the reservations (typically measured by the click-through rate). In the first part of this thesis, we formalize this combined optimization problem as a stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange bids, where the exchange bids are assumed to be exogenous and independent of the decisions of the publishers. 
We prove asymptotic optimality of this policy in terms of any arbitrary trade-off between quality of delivered reservation ads and revenue from the exchange, and provide a bound for its convergence rate to the optimal policy. We also give experimental results on data derived from real publisher inventory, showing that our policy can achieve any Pareto-optimal point on the quality vs. revenue curve. In the second part of this thesis, we relax the assumption of exogenous bids in the Ad Exchange and study in more detail the competitive landscape that arises in Ad Exchanges and the implications for publishers' decisions. Typically, advertisers join these markets with a pre-specified budget and participate in multiple second-price auctions over the length of a campaign. We introduce the novel notion of a Fluid Mean Field Equilibrium (FMFE) to study the dynamic bidding strategies of budget-constrained advertisers in these repeated auctions. This concept is based on a mean field approximation to relax the advertisers' informational requirements, together with a fluid approximation to handle the complex dynamics of the advertisers' control problems. Notably, we are able to derive a closed-form characterization of FMFE, which we use to study the auction design problem from the publisher's perspective focusing on three design decisions: (1) the reserve price; (2) the supply of impressions to the Exchange versus an alternative channel such as bilateral contracts; and (3) the disclosure of viewers' information. Our results provide novel insights with regard to key auction design decisions that publishers face in these markets. In the third part of this thesis, we justify the use of the FMFE as an equilibrium concept in this setting by proving that the FMFE provides a good approximation to the rational behavior of agents in large markets. To do so, we consider a sequence of scaled systems with increasing market size.
In this regime we show that, when all advertisers implement the FMFE strategy, the relative profit obtained from any unilateral deviation that keeps track of all available information in the market becomes negligible as the scale of the market increases. Hence, an FMFE strategy indeed becomes a best response in large markets.
Contextual Standard Auctions with Budgets: Revenue Equivalence and Efficiency Guarantees
The internet advertising market is a multi-billion dollar industry, in which
advertisers buy thousands of ad placements every day by repeatedly
participating in auctions. In recent years, the industry has shifted to
first-price auctions as the preferred paradigm for selling advertising slots.
Another important and ubiquitous feature of these auctions is the presence of
campaign budgets, which specify the maximum amount the advertisers are willing
to pay over a specified time period. In this paper, we present a new model to
study the equilibrium bidding strategies in standard auctions, a large class of
auctions that includes first- and second-price auctions, for advertisers who
satisfy budget constraints on average. Our model dispenses with the common, yet
unrealistic assumption that advertisers' values are independent and instead
assumes a contextual model in which advertisers determine their values using a
common feature vector. We show the existence of a natural value-pacing-based
Bayes-Nash equilibrium under very mild assumptions. Furthermore, we prove a
revenue equivalence showing that all standard auctions yield the same revenue
even in the presence of budget constraints. Leveraging this equivalence, we
prove Price of Anarchy bounds for liquid welfare and structural properties of
pacing-based equilibria that hold for all standard auctions. Our work takes an
important step toward understanding the implications of the shift to
first-price auctions in internet advertising markets.
Single-Leg Revenue Management with Advice
Single-leg revenue management is a foundational problem of revenue management
that has been particularly impactful in the airline and hotel industries: given
$n$ units of a resource, e.g. flight seats, and a stream of
sequentially-arriving customers segmented by fares, what is the optimal online
policy for allocating the resource? Previous work focused on designing
algorithms when forecasts are available, which are not robust to inaccuracies
in the forecast, or online algorithms with worst-case performance guarantees,
which can be too conservative in practice. In this work, we look at the
single-leg revenue management problem through the lens of the
algorithms-with-advice framework, which attempts to harness the increasing
prediction accuracy of machine learning methods by optimally incorporating
advice about the future into online algorithms. In particular, we characterize
the Pareto frontier that captures the tradeoff between consistency (performance
when advice is accurate) and competitiveness (performance when advice is
inaccurate) for every advice. Moreover, we provide an online algorithm that
always achieves performance on this Pareto frontier. We also study the class of
protection level policies, which is the most widely-deployed technique for
single-leg revenue management: we provide an algorithm to incorporate advice
into protection levels that optimally trades off consistency and
competitiveness. Moreover, we empirically evaluate the performance of these
algorithms on synthetic data. We find that our algorithm for protection level
policies performs remarkably well on most instances, even if it is not
guaranteed to be on the Pareto frontier in theory. Our results extend to other
unit-cost online allocation problems such as display advertising and the
multiple secretary problem.
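For readers unfamiliar with the protection level policies mentioned in this abstract, the following is a minimal sketch of the classic nested form, assuming fare classes indexed from highest fare (class 0) to lowest. The function name and the example numbers are hypothetical, and the sketch does not include the advice-based adjustment that the paper studies.

```python
def run_protection_policy(capacity, protection, arrivals):
    """Simulate a nested protection-level policy on a stream of requests.

    protection[j] is the number of units reserved for fare classes with a
    higher fare than class j, so protection[0] is 0 for the top class.
    A class-j request is accepted only while the remaining capacity exceeds
    protection[j]. Returns one accept/reject decision per arrival.
    """
    remaining = capacity
    decisions = []
    for fare_class in arrivals:
        accept = remaining > protection[fare_class]
        if accept:
            remaining -= 1
        decisions.append(accept)
    return decisions


# Hypothetical instance: 3 seats, 2 of them protected for the high fare.
decisions = run_protection_policy(
    capacity=3,
    protection=[0, 2],
    arrivals=[1, 1, 0, 0, 0],  # two low-fare requests, then three high-fare
)
```

In this instance the second low-fare request is rejected so that two seats remain for the high-fare customers who arrive later, and the last high-fare request is rejected only because capacity has run out.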
Online Resource Allocation under Horizon Uncertainty
We study stochastic online resource allocation: a decision maker needs to
allocate limited resources to stochastically-generated sequentially-arriving
requests in order to maximize reward. At each time step, requests are drawn
independently from a distribution that is unknown to the decision maker. Online
resource allocation and its special cases have been studied extensively in the
past, but prior results crucially and universally rely on the strong assumption
that the total number of requests (the horizon) is known to the decision maker
in advance. In many applications, such as revenue management and online
advertising, the number of requests can vary widely because of fluctuations in
demand or user traffic intensity. In this work, we develop online algorithms
that are robust to horizon uncertainty. In sharp contrast to the known-horizon
setting, no algorithm can achieve even a constant asymptotic competitive ratio
that is independent of the horizon uncertainty. We introduce a novel
generalization of dual mirror descent which allows the decision maker to
specify a schedule of time-varying target consumption rates, and prove
corresponding performance guarantees. We go on to give a fast algorithm for
computing a schedule of target consumption rates that leads to near-optimal
performance in the unknown-horizon setting. In particular, our competitive
ratio attains the optimal rate of growth (up to logarithmic factors) as the
horizon uncertainty grows large. Finally, we also provide a way to incorporate
machine-learned predictions about the horizon which interpolates between the
known and unknown horizon settings.
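The idea of steering consumption toward a schedule of time-varying target rates can be sketched for a single resource. The following is a simplified illustration, assuming a Euclidean projection step (plain dual subgradient descent) in place of a general mirror map. The function name, the accept rule, and all numbers are hypothetical and are not taken from the paper.

```python
def dual_descent_allocation(requests, budget, target_rates, step_size):
    """Single-resource allocation driven by a dual price on consumption.

    requests: list of (reward, cost) pairs, revealed one at a time.
    target_rates: target consumption for each step (the schedule).
    The dual variable mu rises when a step consumes more than its target
    and falls (never below 0) when it consumes less.
    """
    mu = 0.0
    remaining = budget
    total_reward = 0.0
    decisions = []
    for (reward, cost), target in zip(requests, target_rates):
        # Accept when the reward beats the priced cost and capacity allows.
        accept = reward > mu * cost and cost <= remaining
        spent = cost if accept else 0.0
        if accept:
            remaining -= cost
            total_reward += reward
        # Projected subgradient step on the dual variable.
        mu = max(0.0, mu + step_size * (spent - target))
        decisions.append(accept)
    return decisions, total_reward


# Hypothetical instance: budget of 2 units over 3 requests.
decisions, total_reward = dual_descent_allocation(
    requests=[(1.0, 1.0), (0.25, 1.0), (1.0, 1.0)],
    budget=2.0,
    target_rates=[0.5, 0.5, 0.5],
    step_size=1.0,
)
```

Here the first acceptance pushes the dual price up to 0.5, which screens out the low-reward second request and leaves budget for the third. A front-loaded or back-loaded schedule in `target_rates` would shift when the price rises, which is the lever a schedule of target rates provides.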